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Method and Apparat us for Querying Relational Databases 



The present invention relates to a mefliod for evaluating a query involving a relational data- 
base and a related apparatus and a program. 

A major goal in bioinformatics is to provide search tools wifli which biologic information can 
be retrieved quickly, effectively and completely. Presently there are huge databases storing 
biological data. Apart from that there are, however, a number of smaller databases and also 
electronic data which are not structured like databases, but rather as flat files, e.g. publications 
in scientific joumals. Thus search tools in bioinformatics frequently have to combine data 
from various sources. Since the user wishes to have the result almost immediately on the 
screen, time is of essence in this regard. 

The SRS query language and package has proved to be a powerM instrument in building 
such search tools. SRS allows to combine information from various sources. Principles of 
SRS are set out in WOOO/41094. Basically, SRS works in a two-step process. In a first step 
CTLtries in a data source are identified andm a second step extracted by a parser. Whereas this 
conc^t works well with flat files, the application to relational databases leads into problems 
as the extraction of information related to an identifier, e.g. a key of a table, may take a com- 
paratively long time in a large database. This is due to the feet that the required information 
usually has to be collected from a plurality of different tables which may not be directly 
linked to each other. Current relational database management systems work in the way that 
they establish a joined table for all tables involved in a query. Since the size of the joined ta- 
bls-is-essCT^tially the product of the sizes of the rows of the individual tables, this quickly 
Irads-to large result sets and, accordingly, to long processing times for evaluating these result 
sets. 

It is an object of the present invention to provide a method, apparatus and program for per- 
ft^miing queries in a relational database wherein queries can be handled more easily and are 
easier to imderstand and wherein queries can be processed more quickly and especially to 



provide a method which is compatible to the process of extracting infomiation from other data 
sources, such as flat files. 

According to the invention Has object is accompUshed by a method of evaluating a query m- 
volving a relational database comprising at least one relational database management system 
(RDBMS). said query relating to at least one table of said relational database, said method 
comprising determining a table of said relational database as a gateway table for evaluating 
said query, retrieving by means of said RDBMS one or more unique identifiers of said gate- 
way table related to one or more entries in a table to be queried, using said RDBMS, retriev- 
TBg information from one or more tables to be.queried related to ^d retrieved unique identi- 
fiers of said gateway table and providing a result to said query. This result may be a result set 
in the conventional sense of a result table, comprising Uie retrieved primary keys of the gate- 
way table ia relation to said retrieved information, or an object comprising the result of the 
query, which may be derived from such a conventional result table. 

According to the present invention, a "gateway table" is chosen which forms the starting point 
to the evaluation of the query. In a way it serves as the entry point or "gateway" to the 
evaluation of the query. In accordance with the invention, either prior to the submission of 
said query or during the evaluation thereof, a table is designated as a gateway table, meaning 
that the step of retrieving one or more unique identifiers is performed with regard to this table, 
provided, of course that this table is related to tables referred to in the query. The designation 
of a table as a gateway table may be done in advance, e.g. as part of the system or database 
settings or by means of user settings prior to title submission of the query. The invention may, 
however, also provide that during the process of the evaluation of the query, a table is deter- 
mined as' a gateway table according to suitable criteria and Aat the above-mentioned steps are 
carried out with regard to this table subsequently. In the preferred embodiment, the unique 
identifiers referred to above are primary keys or unique indices of the gateway table. The m- 
vention may, however, also provide that said unique identifier is a combination of indices of 
different columns and. in certain instances, the combination of all indices of a row of the ta- 
ble, which is, per definition, a unique identifier in relational database management systems. 

The invention may provide that when retrieving said one or more unique identifiers of said 
gateway table, in a first step a predetermined index or predetermined indices are retrieved. 
Should it toim out that this index or these indices do not specify a row of the table m a unique 




maimer, a miique identifier is created from such indices according to a, suitable procedure. 
One procedure may be to add indices to the indices retrieved in said initial step until the com- 
bination of said indices is a unique identifier for a row related to an entry in a table to be que- 
ried. Given the case, this may be continued imtil all indices of a row are comprised in the 
combination. Having thus created a unique identifier, it is verified that the related row still 
relates to said entry in a table to be queried. If it does not, the respective identifier is dis- 
carded. Alternatively, one may, for example provide that if the indices retrieved in said initial 
step actually designate a plurality of rows of the gateway table, one only proceeds with the 
first row, thereby accepting a potential loss of information. 

The invention may provide that said relational database comprises one or more predetermined 
hub tables, and said query relates to at least one table of said relational database and wherein 
said method comprises retrieving by means of said RDBMS one or more unique identifiers of 
a hub table related to one or more entries in a table to be queried, using said RDBMS, re- 
trieving information firom tables to be queried related to said retrieved unique identifiers of 
said hub table and providing a result to the query, e.g. as a result set (object or result table) 
comprising the retrieved primary keys of the hub in relation to said retrieved information. 

The invention may provide that one or more libraries are defined on one or more databases. In 
the seoso as used herein a library is defined as a collection of tables which are linked to each 
other and which are not necessarily wiSuii the same database, wherein there is exactly one 
table defined as a hub table. All tables in a library are linked directly or indirectly to the hub. 
Therefore, any entry in a library can be accessed through an entry in the hub and the (direct or 
indirect) relation to the hub. Thus, the hub table can be considered as representing the library. 
That is, a library in the sense of the invention always has one unique entry point or gateway 
for the evaluation of a query, namely said single hub. If a library is exclusively defined on one 
database, it may be viewed as a restricted database. On the other hand, if tables of a second 
database are comprised in a library, the library is, in a way, an extension of a database. Differ- 
ent libraries may share the same tables. Using the concept of libraries according to the present 
invention, it is possible to define concepts at a higher level than the imderlying databases 
which can be tailored to the needs of the application or the wishes of the user without affect- 
ing the underlying structure of the databases. 



Using the concept of libraries, it is apparent that a query in a plurality of databases can be 
carried out without having a hub in each of the databases, if the Ubrary extends over a plural- 
ity of databases. In this case there is only one hub in one database, namely the hub of the li- 
brary (which is, by definition, comprised m one of said databases). 

Thus, whenever reference is made to a hub or gateway table of a database, it should be under- 
stood that this hub or gateway table may be ahub or gateway table of a library defined on said 
database or on a plurality of databases comprising said database. Since a Hbrary is essentially 
a database structure superimposed on the underlying databases, it should be understood that 
- wheneverreferenceismadetoaqueryinadatabase.ortheevaluationofaquery.^^ 
this reads mutatis mutandis on queries in a Ubrary, unless an indication to the contrary is 
givCTU 

A hub table (also called "hub" subsequently) is essentially a predetermined gateway table for 
evaluating queries in said database or Ubrary. As a rule, a query is for a complete set of re- 
lated entries in a searchable entity, i.e. aU information in any table that is related to a uniquely 
identified entry in a table, or a part of such a set If the searchable entit>' is a relational data- 
base, said complete set of related entries comprises all entries directly or indkectly Unked to 
said entry in said database. By defining a higher level entity on databases or across databases, 
such as a library (to be understood in the above-mentioned sense), a complete set of related 
entries is a set of aU entries in said library that are directly or indirectly related to a uniquely 
identified entry in a table comprised in said Ubrary. In many instances, it is neither desired nor 
necessary to provide all information related to a certain entry in a table. In such cases one will 
restrict the query to certain columns of tables or certain tables. In the SQL language, such 
selection can be made using standard syntax. In a preferred embodiment, flie selection of ta- 
bles and/or columns of tables wiU be predetermined by a user or administrator as part of the 
settings of the user interface prior to typing in a specific query. If a Ubrary is defined exclu- 
sively on one single database, a complete set of related entries in the Ubrary will be part of a 
complete set of related entries of the whole database. This means that by defining the Ubrary 
on the database a restriction as to the tables and columns of tables that are to be queried has 
been made so that when a query referring to this Ubrary is submitted no other tables of the 
database will be queried. It should, however, be noted that a complete set of related entries in 
a Ubrary may not be completely contained in a complete set of related entries of a database, 
since the Ubrary may comprise tables of two or more different databases. 



In most instances the query input typed in by a user will consist of query conditions specify- 
ing data to be comprised in the result of the query. For the purpose of this discussion a query 
condition will be considered as a condition in the query that certain elements are to be present 
in certain ^ecified data fields. With regard to a relational database this would mean that cer^ 
tain entries in certain tables have a specified value. For a flat file this would mean that a cer- 
tain sequence of data, e.g. sequence of letters, is found in a certain data field. Although this is 
what the user sees, the query conditions do not form the entire query that is processed by the 
system. In ord^ to return a result, the system needs additional information on which informa- 
tion related to said query conditions ^lall be retumed, e.g. related information in other tables. 
This additional information is contained in settings which may partly or as a whole be deterr 
mined by a usct. The system will generate on the basis of these query conditions and the pre- 
determined settings regarding the tables and columns to be queried one or more query com- 
mands to be submitted to the RDBMS. The query conditions usually impose a condition on 
the rows of the columns to be selected. In such instances, the method according to the inven- 
tion will first look for entries that satisfy the conditions expressed in the query conditions and 
then return one or more hub indices or, in the preferred embodiment, imique identifiers of the 
hub table. 

The invention may also provide that all entries in a column are to be retrieved in the query, in 
which case the query would retrieve all unique identifiers of tixe hub table related to a certain- 
column. 

The invention may provide that said query is for complete sets of related entries of one or 
more databases or libraries or predetermined parts of such complete sets of said relational 
database and comprises one or more query conditions related to said database or library, 
wherein said method comprises: 

identifying a gateway table related to entries specified in a query condition, 
identifying one or more unique identifiers of said gateway table related to said 
entries conforming to query conditions, 

retrieving complete sets of related entries or parts thereof which are related to 
said unique identifiers of said gateway table. 
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In many instances a query wiU relate to different entities, e.g. to two databases or two librar- 
ies, a database or a library and a flat file or multiple databases, Ubraries and/or flat files. 
Likewise it may be that a query relates to tables in one database which are related to different 
hubs of the same database. In all such instances the invention provides, according to a pre- 
ferred embodiment thereof, that those parts of a query that are related to a hub or. in case of 
an eatity other than a library or a relational database, to one or more sub-entities of said sec- 
ond entity are processed separately and the result of the partial searches are combined by us- 
ing relations between the unique identifiers of the hubs, e.g. primary keys of the hubs and 
identifiers of other entities and, given the case, relations between a pluraUty of such identifi- 
ers. In a sunple case, it is determined which unique identifier of a hub. relates to which othor . 
-unique identifier of another hub or to which identifier and the results of the related two partial 
queries are combined. 

The invention may provide that said query involves at least a second searchable entity outside 
said database or outside a Ubrary involved in said query, said second entity comprising sub- 
entities each having at least one identifier uniquely identi^ong said sub-entities, and wherein 

said method comprises: 

retrieving one or more identifies of sub-entities of said second searchable en- 
tity related to said query, especially related to query conditions of said query, 
retrieving one or more unique identifiers, e.g. primary keys, of a hub of said 
relational database or library related to said reeved identifiers of said sub- 
entities, 

retrieving sets of related entries, e^edally complete sets of related entries, re- 
lated to said retrieved unique identifiers of said hub, 

retrieving information firom a sub-entity identified by a retrieved identifier in 
said second entity, 

combining the retrieved information firom said second searchable entity and 
said data base or library into a result set. 

Additionally or alternatively the invention may provide that said query involves at least a sec- 
ond searchable entity outside said database or outside a library involved in said query and 
comprising sub-entities, each sub-entity having at least one identifier uniquely specifying said 
sub-entity, and wherein said method comindses: 




retrieving one or more unique identifier of a hub of said database or library re- 
lated to entries related to said query, especially related to query conditions of 
said query, 

retrieving identifiers of sub-entities of said second searchable entity related to 
said retrieved unique identifiers of said hub, 

retrieving sets of related entries or parts thereof related to said retrieved unique 
identifiers of said hub, 

retrieving information 6x>m said sub-entities identified by identifiers retrieved 
in said second searchable entity, 

combining the retrieved information from said second CTitity and said data base 
or library into a residt. 

A searchable entity in the sense of this ^plication may be a database, a . library and identifiers 
thereof may be primary keys or other unique identifiers of a hub table. 

The invention may provide that said second searchable entity is a second relational database 
or a library and said identifier is a primary key or another imique identifier of a hub table in 
said second relational database or second library, said sub-^tity being a table, a combination 
of linked tables or a part thereof. 

The invention may provide that said second searchable entity is a collection of flat files with 
the sub-entities being flat files in this collection. 

The step of retrieving information related to the retrieved identifier of the second entity and/or 
retrieved imique identifier of the hub may be performed prior or after the step of retrieving 
relations between the identifier of the second entity and a unique identifier or identifiers of the 
hub or hubs, hi the preferred embodiment of the invention, one proceeds, however, by first 
evaiuaiing the identifiers of one entity related to a query condition and then retrieves related 
iS^tifiers of the other entity and uses these identifiers of the other entity as the starting point 
for retrieving the related information in said other searchable entity. If the query conditions 
relate both to the database or library and to the second searchable entity, one preferentially 
proceeds by first retrieving the identifiers of the second entity and the unique identifiers of the 
hub or hubs related to the query conditions, then establishing combinations of identifiers of 
said second entity and identifiers of said hubs which are related to all query conditions, i.e. 



• -s. # 



both the conditions relating to the database or library and to the second searchable entity, and 
then retrieving additional information only for those combinations which are consistent with 
all query conditions. 

The invention may provide that said step of retrieving a relation between identifiers of said 
second searchable entity and unique identifiers of hubs of said database or library comprises 
the step of discarding combinations of unique identifiers of hubs and identifiers of said second 
searchable entity which are not consistent with the query conditions and, retrieving only such 
additional information related to an identifier which is comprised in combinations of identifi- 
ers cdnsistent with the selection parametCTs; - 

The invention may provide that the query relates to tables related to at least two hub tables, 

whereui said method comprises: 

retrieving one or more unique identifieis of a hub table or hub tables related to 
entries satisfying query conditions in tables related to the respective hub, 
retrieving unique identifiers of the respective other hub or hubs related to said 
retrieved unique identifier related to entries satisfying said query conditions, 
retrieving sets of related entries or predetermined parts thereof related to said 
retrieved unique identifiers of said hubs according to the query, 
combming the retrieved information related to said hubs into a result. 

Retiieving unique identifiers of the respective other hub or hubs may involve unique identifi- 
ers which have been found in the first step of retrievmg unique identifiers related to query 
conditions. In such a case, a consistency check is carried out whether the combination of 
unique identifiers meets the query conditions. 

The invention may provide that said step of retrieving a relation between a unique identifier 
of said hub tables comprises the step of discarding combinations of unique identifiers of hub 
tables which are not consistent with the query conditions and retrieving only such additional 
information related to at least one unique identifier of at least one hub which is comprised in a 
combination of unique identifiers consistent with the search parameters. 



Of the two above-mentioned hubs, at least one may be the hub of the library or both of them 
mi^t be hubs of a library. It may also be provided that said two hubs are hubs within the 



same relational database. These can, but do not have to be hubs of two libraries defined on 
this database. One or both may also be a hub which is not relied to a library. ■ 

The invention may also provide that after p^forming a partial query for each hub for said sets 
of related entries or parts thereof the respective results are joined and subsequently checked 
for consistency with the query conditions and, given the case, for redundancies. The invention 
may also provide that an object is created for each partial result and these objects are further 
processed to yield the result of the query. 

The invention may provide that the step of retrieving identifiers of a searchable entity which 
are related to another identifier of a searchable entity is performed on the basis of pre- 
established relations between identifiers of said entities. 

The invention may also provide that the step of retrieving identifiers of a searchable entity 
which are related to another identifier of a searchable entity is performed dynamically during 
the execution of the query. The invention may also provide that said step is performed partly 
on the basis of pre-established relations and partly dynamically. 

Especially, the relation between hubs of a database and an identifier of another searchable 
entity can be established on the basis of a static link or of a dynamic link that is created „on 
the fly**. - 

Retrieving information &om tables referred to m the query and related to a unique identifier 
of a hub may be a comparatively slow process for large databases which is due to the way 
current relational databases work. If a query involving a plurality of tables is input under a 
standard language, e.g. SQL, the RDBMS will form a jomed table of aU tables involved 
which is essentially the cartesian product of all tables. The number of data sets in a joined 
table is essentially the product of the number of rows of the individual tables involved which 
means that the size of the joined table grows exponentially with every fiirther table involved 
in the query. This leads to a rapid decrease of computer speed with the mmiber of tables in- 
volved. 



According to one aspect of the invention this is counteracted by controlling more precisely 
what tables are involved. For the purpose of the fiirther discussion it is helpfiil to use the rep- 
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resentation of a database as a node diagram. A node diagram is to be understood as a graphi- 
cal representation of the database, vfh&t&ai tables are represented as nodes and links between 
tables as lines between the nodes. A link between the tables may especially be a link through 
a foreign key, but is not restricted fliereto. Using the concept of a node diagram, any relation 
between two tables of the database can be visualized as a path between two nodes which is 
either direct (in which case there is a direct link between the two tables) or which is indirect 
and passes one or more intermediate nodes. Using this concept it is possible to control the 
number of nodes involved in a query and to formulate the query in a way that a defined num- 
ber of intermediate tables are involved. 

The invention may provide that in performing said step of retrieving information related to a 
unique identifier of said gateway table, selected tables are queried which, in a graphical repre- 
sentation of the database wherein the tables are represented as nodes and links between the 
tables are represented as lines between the nodes, form a connected graph connecting the 
gateway table to tables referred to in the initial query. 

A fijrttier improvement of this concept resides in that a large query can be split up into a num- 
ber of intennediate queries which are more easily evaluated, because the joined tables in- 
volved are much smaller. Essentially, a link between two tables in said graph is replaced by a 
junction. A junction between two tables means that the two tables are not involved in the 
same query, but that the values of the link keys or anotiier input to the link found in a previous 
query for one table are used as the input for the Imk keys in a query involving the second ta- 
ble. 

The invention may provide that said step of querying tables on said graph comprises per- 
forming consecutive partial queries, wherein a result of a previous query is used as iiq)ut for a 
later query, preferentially the next query along said graph, a first of said partial queries in- 
volving the gateway table and a query other than the first query relating to a table referred to 
in the initial query. 

According to the invention the partial queries are structured such that only tables are involved 
in a partial query which are diiectly linked to another table in said query and that every table 
in said gn^h is mvolved in at least one query. 




Thus, there is a complete chain of queries leading from a hub table to. a table referred, to in the 
query. 

When the relation between the tables is a foreign key relation, the invetitiGn may provide that 
the result of said previous query comprises the vahie of a foreign key of a table involved in 
said later query and whereia said value of said foreign key . is used as input for said later 
query. 

For each partial query, the result is stored to be retrieved later on for combining the results of 
various partial queries to a result or partial result of the initial query. For the sake of clearness, 
the query from which the method according to the invention starts is tenned here and there- 
after as 'Initial query". 

According to a preferred embodiment, the queries are non-overiapping in the sense that, apart 
form the input of a previous query, only tables are queried which are not queried in another 
partial query. 

Splitting up a big query into a plurahty of smaUer queries fbs result of which is subsequently 
merged has the advantage that smaller queries retum smaUer result sets. . Since consecutive 
tables along gr^h are directly linked, there is also a direct jom. between consecutive tables, 
meaning that the joined table of a single partial query is usually not excessively large. How 
many consecutive tables are encompassed by a partial query depends on the relation between 
the tables. Usually, one will determine the various partial queries m a manner that the result 
set is easy to handle and to be purged of redundancies. 

The invention may provide that after each partial query a redundancy check is done on the 
result table. According to one embodiment of the invention, objects without redundancies are 
created on the basis of the result set of the partial queries. 

The stored results may be further filtered or purged depending on the outcome of further par- 
tial queries. For example, if a first partial query returns the value of a key that later on turns 
out not to be related to a table referred to in the initial query, data in the object related to said 
result and comprising said key may be removed from the result set. 
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The invention may provide that each partial query mvolves a table or a pluraUty of tables 
linked to each other and wherein each partial query has as input previously estabUshed values 
of keys of a table, especially link keys. Said link keys link said table or one or more of said 
plurality of tables to another table not involved m said partial query. 

Said values of Unk keys may have been found as the result of a previous partial query. The 
key values used as input may also be values of a key of a gateway table that have been deter- 
mined in a previous step. The first partial query differs in that the gateway table is part of the 
query while values of its keys axe used as input 

The invention may provide that said graph comprises at least one branch node having links to 
at least two other nodes and wherein tables referred to in the initial query are related to sepa- 
rate branches deriving ftom said branch node, wherein a partial query is carried out involvmg 
the table corresponding to said branch node (branch table) and wherein at least one partial 
query is carried out for one or more tables contained in each branch which has the result of 
the partial query involving the branch table as an mpvt 

The respective consequent partial queries for each branch may have further consequential 
partial queries eventually mvolvmg Ihe tables referred to in the initial query. The chain of 
queries for each branch can be evaluated consecutive or parallel to each other. 

If one of the branches mvolves a table related to a necessary query condition, e.g. specified in 
a WHERE clause under SQL, one wiU preferentially evaluate this branch first and thereby 
retrieve those keys of the branch node which are actually related to the entries satisfying said 
necessary condition. In general these will be less than those keys which were retrieved when 
ttie branch node was "passed", i.e. mvolved in a partial query, for the first time. Accordingly 
the result of the first partial query involving the branch node will be reduced to only comprise 
such keys of the branch node. This reduced set of key values will then be used as input for 
one or more partial queries related to the other branch. 

If two or more branches are related to necessary query conditions one may also proceed by 
evaluating each branch separately on the basis of the keys of said branch node evaluated pre- 
viously. The evaluation of each branch will each return as a result those keys of the branch 
node which are related to an entry satisfying a necessary condition. In a consequent treatment 




of the related result only those entries will be retained which comprise keys of the branch 
node related to entries satisfying either condition. 

In a way, a branch table is similar to a hub table in that multiple queries are derived there- 
ftom. In fact, in most instances, the hub table will also be a branch- table; a typical example 
would be a database where the diagranomatic representation of the . database is star-like with 
the hub at the center of the star. Thus, the methods for evaluating the branches deriving jfrom 
a branch node can be similar to tiie methods of evaluating distinct graphs deriving from the 
same hvb. One shoidd, however, bear in mind tihiat a hub table and a branch table are defined 
differently. The difference in the definition of a hub table and a branch table is that the hub 
table is a predetermined entry point for evaluating a basic query and does not necessarily 
mark a branch, whereas a branch table may also be a table that is determined during the 
evaluation of an initial query. 

A method according to the invention may comprise the steps of 

identifying the hub or hubs related to tables referred to in the initial query, 
determining in said graphical representation of said database^ at least for one hub, 
preferably for all hubs, an optimum graph connecting said hub to all tables which are 
related to said hub and which are referred to in the initial query, 
performing a query on said optimum graph. 

It may be provided that evaluating said query on said graph, consecutive.partial queries in- 
volving tables which are consecutive to each other with regard to said optimum graph are per- 
formed. 

The optimum graph referred to above is a graph determined by an optimisation algorithm. 
Optimization algorithms for finding an optimxmi graph connecting given points, such as 
Bijkstra-3 algorithm, are well known in the art. The graph is preferentially optimized with 
r^ardto tihe speed with which the sequence of partial queries can be resolved. One way of 
proceeding is to assign a weight to each link between nodes in the graph and to vary an initial 
graph until it is optimized with regard to the accumulated weights. This weight may be im- 
plemented as a metric and accordingly the search is for the "shortest" path according to this 
metric. 



The concept of using a node diagram to customize the query process and the concept of split- 
ting vp a large query into a number of smaller consecutive queries, especially the concept of 
junctions, are not restricted to the evaluation of queries involving a predetemuned hub table 
and a table related to said hub. In feet, this concept can be used independently of the concept 
of hubs, e.g. in that for a query involving a relational database a gateway table may be defined 
which forms the starting point for the evaluation of the query and from which a path or gr^h 
to the tables involved in the query is established and evaluated, as described above. This 
gateway table need not necessarily be static or predetermined, such as a hub, but it may also 
be chosen with respect to a specific query, even in the process of evaluating the query in a 
way that the graph connecting this gateway table and the tables referred to in the- query is op- 
timized with regard to the speed of the evaluation process, e.g. with algorithms as mentioned 
above. 

For example, a further appUcation of this concept may be the evaluation of links between two 
databases. According to an embodiment of the invention relations between two databases are 
always established through the hubs of the two databases. In order to establish such a relation 
an existing link between two tables of the two databases (hereinafter referred to as "linking 
tables") can be employed to provide a dynamic link between two hubs of the two databases. 
Starting from a hub table in the first database and a certain unique identifier of this hub, the 
linking table in the first database is fi)und and the relation of said hub table to said linking 
table is estabUshed, e.g. using a node diagram of the database. The entries in said linking table 
related to said unique identifier of the hub table and the related entries in the linkmg table in 
the second database are determmed. Subsequently, the unique identifier of flie hub of liie sec- 
ond database related to the entries of said second Unking table are determined. Although this 
procedure could be done in advance e.g. for all primary keys of the hubs involved, whereux 
relations between primary keys of hubs thus found could be statically stored in advance, it is 
presently preferred to perform these steps dynamically during the evaluation of a query. The 
concept of graphs in a node diagram and of junctions can be applied to dynamically execute 
such a Unk. Like it was explained before, a graph connecting the two hubs through the linking 
tables may be established (wherein the link between the two linking tables of different data- 
bases is treated like a normal link in a database) and a number of consecutive queries, starting 
with certain values of the primary key or another unique identifier of the first hub, eventually 
leading to the second hub, is periformed and evaluated. For the sake of completeness it should 




be mentioned that one could also use a mixed ^e of link, where part of the steps are per^- 
formed in advance and part of them dynamically. 

The invention may provide fliat said step of retrieving unique identifier of said gateway table 
comprises: 

determining a table that is referred to in the initial query, 
detennining, in a graphical representation of said database, wherein tables are 
represented as nodes and links between tables as lines between the nodes, a 
gateway table connected to said table, 

querying said database for one or more indices, especially primary keys, of the 
gateway table which are related to said table. 

The invention may provide that one or more specific entries of said, table are implied by a 
query condition and said database is queried for one or more indices of said gateway table 
which are related to said entry. 

The invention may provide that in said graphical representation, a path from said table to said 
gateway table is established and said query for said indices is performed by querying all ta- 
bles correspondmg to nodes in said graph for the values of link keys , between the tables in 
said graph, starting from the table referred to in the query and, given the case, certain entries 
thereof 

The invention may provide that said path is selected as a shortest path between said table and 
said gateway table according to a predetermined metric. 

The mvention may provide that said path is part of or identical to the graph for detennining 
partial queries for retrieving additional information firom tables related to said gateway table. 

Hie invention may provide that, if an index or a group of mdexes related to the same row of 
the gateway table and determined by said step of querying the database does not uniquely 
identify a row of said gateway table, a imique identifier for one or more rows of the gateway 
tables is determined that is related to said indices. 
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The invention may provide that partial quaies used for evaluating the initial query are at least 
partiaUy and preferable completely created dynamically during the process of said evaluation. 

The invention may provide that said result set is represented in an object oriented representa- 
tion. 

The invention may provide that the result of said initial query is expressed as an object de- 
rived by object-relational mapping. In the preferred embodiment, after each partial query per- 
formed in the process of evaluation of the initial query, an object is created by object- 
relational mj^ping and the object representing the result of said initial query is-decived from. - 
these objects relating to the partial queries. In one embodiment this object or objects created 
from the result of a partial query are represented in XML. 

The invention may provide that said evaluation of said query is performed under the control 
of an object manager, said object manager conq)ri8ing a sequence of commands to be exe- 
cuted by a computer system. 

The invention may provide that said object manager handles an object which represents the 
schema or part of a schema of one or more databases to be queried. 

The invention may provide that said object manager defines classes which are dynamically 
created and initiated. 

The invention may also provide a data processing system for controlling the evaluation of a 
query involving a relational database comprising a relational database management system 
(RDBMS), said query relating to at least one table of said relational database, comprising: 
means for determining a table as a gateway table for evaluating said query, 
means for estabUshing a relation between a table or tables to be queried and a 
gateway table, 

means for causing the RDBMS to retrieve one or more unique identifiers of 
said gateway table related to one or more entries in a table to be queried, 
means for causing the BDBMS to retrieve information from tables to be que- 
ried related to said unique identifiers of said entry. 




means for providing or causing to be provided a result to said query, which 
may be a result set or result object comprising the retrieved primary keys of the 
gateway table in relation to said retrieved informationr 

S^d data processing system may comprise means for setting certain tables in said relational 
database as predetermined gateway tables for queries to be evaluated: 

A data processing system according to the invention may, in further embodiments, comprise 
means for controlling the execution of a method as previously described by a data processing 
system or data processing systems. 

The invention also provides a computer program causing a computer or computer system, 
when executed thereon, to perform the steps of a method as previously described and a com- 
puter readable storage mediimi, comprising such a program. 

According to an important aspect of the invention, the method according to the present inven- 
tion uses a standard relational database management system and defines all manipulations of 
queries outside this relational database management system. This allows for a great extent of 
flexibility and m fact creates a platform that can be used independently of the underlying 
system. Surprisingly, it was found that according to the method of the present invention, es- 
pecially evaluating a query according td^iixk concept of graphs and junctions, may lead to a 
significant increase in speed, which can be as high as 50 %, and even higher, if tables with a 
plurality of 1 :N relations are involved so that combinatorial ^lodon sets in. 

Further features and advantages of other mventions will be apparent firom the following de- 
scription of an embodiment of the invention, taken in conjimction with the drawing. 

Fig. 1 shows a node diagram of a simple database referred to in the explanation of a specific 

example. 
Figs. 2A to 2E 

show tiie tables of the database according to Fig.l 
Fig. 3 shows a node diagram for a second example. 
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According to one embodiment of the invention one first performs an analysis of the database 
for which the invention shall be implemented. Essentially one collects information about the 
tables of the database, the columns, their type, their size, then: indices and keys. Analysis tools 
for collecting such information are readily available in the art. For example JDBC (Java Da- 
tabase Comiectivity) or any other RDBMS API may be used to collect such information. 
From this information, a node diagram of the database is constructed in the form of a mini- 
xnum spanning tree using the key relationships between tables. Essentially, for any table one 
looks for a key linking this table to another table (link key) and if there is such a link key. one 
introduces an edge between two nodes representing the two tables. It should be understood 

that of course one does not draw an actual graph, but rather.. ccmstructs an object structure 

which m^s to such a graph. For ease of explanation, reference is made herein to the gr^h or 
elements of the graph, it being understood that such reference is meant to relate to respective 
elements, relations or operations on the corresponding data stmcture. 

UsuaUy, both nodes and edges will be assigned some attributes. For example, one may assign 
a weight to each edge which is used later on for determming the optimum graph, one may 
assign an attribute indicating that the edge represents a 1:1. 1:N. N:N or N:l relationship or 
one could assign attributes representing the presence of indices in particular relations between 
two tables, just to mention a few examples. Attributes assigned to nodes (tables) may, for ex- 
ample, be the name of the columns, tiieir size, or other relevant mformation. 

At tiiis point one may alSo add or change relationships between tables manually where a rela- 
tionship is required. For example, if there is a table Usting tiie autiiors of a scientific pubUca- 
tion and a table relating to contact details, such as e-mail addresses, which table is not or not 
directly related to the author table, one may manually insert a relation between the name of 
the author in tiie one table and his or her e-mail address in the other table. This relation may 
be hnplemented as a modification of the database. It may, however, also be implemented out- 
side flie database as an extramal link between the two tables. 

The result of this step is an object representing the schema of tables. This schema object is 
stored for further use. The exemplary embodiment described here uses the SRS enviromnent 
and under this environment this schema object wiU be stored as an SRS object manager ob- 
ject, for example through fee SRS Java AppUcation Programming Interface (API). The object 
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manager is a tool controUing the evaluation of a query under SRS. Details of the SRS query 
language can, for example, be found at ht^://srs.ebi.ac.uk. 

As a further step, hub tables are defined which shall serve as entry points or gateways to the 
tables in the evaluation of a query. A requirement for a hub is that it comprises unique identir 
fiers, Le. every data set in the table can be umquely identified by an identifier. . Preferably^ 
unique indices or primary keys are used as unique identifies, but e.g. .unique combinations of 
indices may also be used. There are essentially two considerations that can lead to defining a 
table as a hub. One derives firom the user sphere and seeks to provide a focus point in the 
query that is mstructive to the user. The other consideration is more technical and seeks to 
define the table fax a manner that queries can be carried out most efficiently. Thus, the hub 
table may be a table representing information that is a central point of interest. From a techni- . 
cal pouit of view, the hub table should be a table that is likely to be queried and which is pref- 
erentially linked directly to other tables which are Ukely to be queried or related to such table 
in a way that queries involving the hub and such tables can be evaluated most quickly by, the 
RDBMS. One may, for example, think of establishing statistics which tables are most fi-er ■ 
quently queried in a relational database and which combinations of tables are most fi-equently 
queried and choosing the hub table accordingly. The two considerations can frequently be. 
reconciled to each other in fliat a table that is firequently queried is usually also a table of pri- 
mary interest to a user. 

The focus of interest may shift, according to tihie user. Whereas one user may be primarily 
uiterested fai authors of scientific applicajions, another user may primarily be faiterested ui 
certain keywords in' a scientific publication and not or only mildly interested in the author. 
Accordingly, the invention may provide that a user or a database manager can define a hub or 
hubs as part of the user settings or even when typfaig a query. It should be noted that more 
than one hub may be defined in a sfaigle database. This may be usefiil in large databases in 
that-it-allaws smaller sub-stracturesi such as libraries, to be formed which can be used to in- 
creaMf pafarmance in the case of particularly large schemas containnig large data sets. An- 
other reason that can lead to defining a plurality of hubs in a database is that one can in this 
way -define clusters of tables in a database which are known to be queried in conjimction or, 
as far as tables of different clusters are concerned, are known not to be fi-equently queried in 
coryunction with each other. This will later on restrict the number of possible paths between a 
hub and a table to be queried in many uistances and thus contribute to the speed of the appli- 
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cation. The definition of hubs has a stiU furtHer purpose under SRS. By choosing a hub table 
and unique identifiers, e.g. primary keys, as identifiers for entry points to the database, a data 
structure is achieved that can be viewed as centered around a 'liub colunm", meaning that 
access to the data is tough a well-defined identifier. This establishes a similarity to flat files 
and accordingly allows to use procedures of SRS for relational databases which have been 
developed for flat files. 

When a user inputs a query, he usually only specifies keywords he is interested in and gets 
back all information related to said keywords. These may be complete database entries. An 

-cmbodimentaftheinventionmayprovidethatmeuser.canspecifyinthe.^^^ _ 
face which tables or columns of a database he is interested m. This can. for example, be ma- 
plemented by providing tick-boxes which tiie user uses for indicating Ihe information of inter- 
est. Additional information about the extent of the requested information may be implemented 
in the system settings which may or may not be changeable by a user. 

When a user types in a query, the system, according to this example, works in a two-step pro- 
cess in analogy to the standard procedure under SRS for retrieving information. In the first 
step'unique identifim of a hub (or the hubs) are identified which relate to the tables relatmg 
to the query conditions, e.g. Ihe tables containing the k^ords requested by the user. In a 
second step, all information is retrieved that relates to the query conditions, to the extent such 
information has been specified in the system settings or by a user iapMt. 

More precisely, m the first step an analysis of the query is carried out with a parser, e.g. with 
an ICARUS parser. The result of this analysis could be a binary tiree representing a hierarchi- 
cal analysis of the query and specifying at a higher level the database, at a lower level the 
identifier field, e.g. "author" and at an even lower level the keyword required, e.g. "Smith", 
■me system now maps this query to the node graph of the database stored in the system. To 
stay with the above-mentioned example, the system verifies whether tiiere is a table m the 
specific database uniquely identifymg authors and the column in which the name of tiie autiior 
should occur. If such a table is found, the system verifies whether and whichhub is related to 
ibis table in said node diagram. It tiien generates a query for unique identifiers, e.g. primary 
keys of the hub table related to tiie entry in said table meeting the query conditions. Since it 
has been previously established that the hub is related to the tables in the input query by at 
least one path, tiiere should be at least one unique identifier of tiie hub related to die entry. 



la a specific embodiment, the system establishes a path in said node diagram from the table to 
be queried to the hub table and, in querying tfie unique identifier of the: hub, also queries for 
the keys of the intermediate tables (nodes) in the path related to the entry specified by the 
query condition. In this way, a comqplete set of relations to the entries meeting the query con- 
dition, including the unique identifier of the hub, is established. 

Assuming that there is only one hub and one database involved in the query, the system now 
establishes a graph or tree in said node diagram having the hub table as: its origin wherein all 
tables referred to in tiie query appear as nodes. This graph is preferably a graph without any. 
loops, i.e. there is only one path along the graph from the hub table to any table referred to in 
the query. The graph may have several branches originating in the hub and each brancb.may 
contain further branches. Ideally, the part of the graph between tables referred to in the query 
should be minimal. This is, however, not to be understood in a graphical sense, but rather .in 
the sense of a metric defined on the graph and used for the optimization algorithm. For exam- 
ple, the evaluation of a query corresponding to a graph comprising a plurality of intermediate 
tables with 1:1 relations between tiiem may be faster than the evaluation of a query corre- 
sponding to a graph with one single intermediate table having a 1:N relation to its neighboring 
tables along the graph and tibius be "shorter"* in the sense of a related metric. 

Having established the optimum graph, the graph is evaluated in that a query is made for all 
entries in tables referred to in the initial query that are related to fixe values of the imique 
identifier identified in the first step. More ^ecifically, the system creates an object plan for 
the query that consists of classes and partial SQL queries which are created dynamically. The 
information necessary for creating and instantiating classes and partial queries is held in a 
data structure called '^turntable" which contains all the information needed to execute a query 
with different initial inputs, i.e. different values for the hub identifiers. The object plan is cre- 
ated according to the schema and information passed as to the graph for this query and may 
involve the optimization step. Once this object plan has been created, it is given initial infor- 
mation, i.e. the values of unique identifiers of the hub and subsequently run. In evaluation of 
the query, the system will query for the value of all entries of the tables represented by nodes 
on the branch which are related to the primary key of the hub. As pointed out previously, for a 
large number of tables along a branch this can, if done in the usual way, lead to large result 
sets with a lot of spurious entries. According to the invention, the query involving the tables 
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of a partial graph along a branch is spUt up into a plurality of consecutive queries of tables 
along the grq>h, wherein the respective numbers of tables involved in the partial query is 
smaller than the number of tables in the query that would involve the entire branch. These 
multiple queries are sequential to each other in that the result of prior queries is stored and at 
least partly used as an input for at least one subsequent query. 

The technique of connecting two consequent queries along a graph in that the result or part of 
the result of the first query is used as input to a consequent query, is referred to as a junction 
herein. A junction means that two tables corresponding to nodes in the graph which are di- 
rectly connected to each other are not involved in.the same query.. For each .pair..of . tables, cor- 
responding to directly connected nodes along the graph, there is a link, usuaUy a link key, 
establishing the relation between the two tables. Using a junction, input to the link, usually the 
values of the link key established in the evaluation of a previous query involving one of the 
tables will be used as the input for a subsequent query involving the other table. This means 
that there will be no join involving these two tables and, accordingly, the result sets of each 
partial query wiU. be smaller. 

After each partial query it is determined v^^ether akey determined in aprevious step is related 
to a subsequent table along Ihe gr^h at all. If it is not, the respective key value will be re- 
moved from the result of Ihe partial query. Additionally, filtering and a check for redundan- 
cies and/or consistency may be carried out for the result of each partial query prior to storing. 
A redundancy check may, in one embodiment, start from Ihe columns relating to the tables 
along the part of the graph currently queried which are most distant to the hub. If an identical 
combination of keys turns out at this level and therc is no N:l or N:N relation along this part 
of the graph, it means that there is a redundancy and the respective entry is removed. Other 
techniques of redundancy checking may. of course, be applied. Since only a small result set is 
to be created and filtered in each step, such filtering, redundancy checking etc. may be done 
more quickly and efficiently than the redundancy check and filtering of a large result set 
which would result if all tables referred to in the initial query were queried at once. The result 
of each partial query, having redundancies and/or inconsistencies removed, is stored as an 
object and information is stored that is needed to retrieve said object. 



in evaluation of a branch of the gr^h, a fiir&er branch is identified, subsequent queries 
along each branch may be carried out separately, eiflier parallel or consequential to each 




other. If the subsequent branches are relatively short, one may, however, also choose to in- 
volve the tables along these subsequent branches, given the case, together with the table cor- 
responding to flie branch node, in one single partial query. 

Having evaluated a branch entirely in this manner, the results of the partial queries are com- 
bined and again checked for redundancy and consistency. Having evaluated each branch 
originating j&om a hub, one has obtained a result set for each branch, each identified by the 
value of the unique identifier of the hub. The results for different branches are then merged 
and again checked for consistency and redundancies to obtain the final result. 

This result is expressed as an object struchure, e.g. an object structure as is provided in the 
firamework of SRS. Defining this object stmcture or objects representing results of partial 
queries, one can make use of dynamic classes, i.e. classes that are defined while processing 
and then instantiated to create objects. Dynamic classes per se are not new and have been 
known in ttxe language Smalltalk, but are not available in cxirrent languages such as C^. or 
Java which have only static or compile time classes, SRS supports dynamic classes, although 
previous applications did not make use thereof. 

According to an important aspect of this embodiment, all queri& created dming the evalua- 
tion of the initial query are dynamically created while processing. Li other embodiments, par- 
tial queries along certain partial graphs between hubs and tables which are fi:equently asked 
for may be determined in advance and optimised. It can also bfe contemplated to detemune 
certain partial routes leadmg to a "cluster'' of tables in advance, wherein the final part of:the 
route to the required table is dynamically determined ("on the fl/'). A Gsxthet optimisation 
technique may be disregarding some cotmections in the graph of a database. 

The above-mentioned two-step process can be viewed as resolving the query conditions in a 
first step and retrieving information related to the result thereof in a second step. In one em- 
bodiment of the invention the tables comprising the entries which meet the requirements of 
the query conditions are not necessarily part of the partial queries performed in said second 
step, e.g. if the information to be extracted fi-om the table is restricted to the value of the en- 
tries relating to the query condition. It may also be provided that already in the first step all 
information related to the entries specified by the query conditions in the same table is ex- 
tracted and stored so that there is no need to "revisit" the related table later on. 



The above-mentioned two-step process is not necessary. One may imagine an embodiment 
wherein the system retrieves additional information when retrieving tiie unique identifiers of 
the hub, starting from the table referred to in tiie query condition. One may, for example, pro- 
vide in querying for tixe .primary keys of the hub, tiie system queries also for tiie keys of 
tiie intermediate tables in a pafli starting from tiie entiy specified by tiie query condition to tiie 
related hub and, in subsequent queries, retiieves information related to tiiese keys. 

Previously, a description was made for tiie case fliat only one hub or only one database or 

library is involved. — • - - 

Cases where tiiere is a plurality of hubs involved, e.g. in cases of a pluraUty of databases, a 
pluraUty of libraries or a plurality of hubs m one single database, are treated in such a manner 
tiiat tiie initial query is spUt up into a number of partial queries, part of tiiem corresponding to 
graphs having hubs as end points and tiie remaining part of tiiem containing only one hub. 

m tiie case of two hubs, one may. in a first step, retiieve tiie values of tiie unique identifiers of 
the hub tables related to tables which have entries meeting a query condition, as descnbed 
above, said relation to said tables not involving tiie respective otiier hub. Subsequently and m 
a seccind step, one will determine an optimum graph which consists of partial graphs (corre- 
sponding to a partial query) related exclusively to one single hub and not involving any otiier 
hub and one or more partial gr^h (corresponding to one or more forttier partial queries) con- 
necting ttie hubs to each otiier. One will tiien evaluate for each hub tiie partial graph exclu- 
sively related to tiiis hub and store tiie respective results. One will flien evaluate tiie query or 
queries corresponding to the partial graph or graphs between tiie hubs. THe result of tiiese 
partial graphs is tiien combined to yield tiie result for tiie entire query. 

Alternatively, one could apply tiie concept of junctions for evaluating tiie link between tiie 
two hubs and introduce a junction behind each hub in tiie partial graphs comiecting two junc- 
tions and, given ttie case, at furflier locations m tiiese graphs. Evaluating tiiese partial graphs 
linking the hubs, one starts witii tiie values of a unique identifier of one hub and evaluates a 
query for related values of unique identifiers of tiie otiier hubs, which can be done in tiie same 
mamier, especially using junctions, as was described before for tiie case of tiie evaluation of a 
query involving a hub and related tables. As a result, one will obtain a set of values of tiie 



ludque identifiers of the other hubs which is then checked for consistency with the query con- 
ditions. This way of proceeding is especially advantageous if the query conditions only relate 
to tables related to one hub. In this case, one will only retrieve primary keys of this hub in the 
first step and the other hub is effectively treated like a branch point in a graph witii a jimctioti . 
introduced histead of the last link on the graph connecting to the second hub. 

Depending on the nature of the hubs and on whether the hubs are. predefined in the system 
setting or chosen by a user, one may contemplate to store predetermined relations between the 
primary keys of a plurality of hubs in one single database. Conceptually, this would mean that 
one introduces in the node diagram a direct link between the two hubs which would be used 
instead of the dynamic link described above. 

Another possibility that may be contemplated is retrieving the primary keys of all hubs in one 
query involving all query conditions in the first step of the procedure, :thereby obtaining the 
relevant combinations of primary keys. This may be a sensible way of proceeding especially 
in cases where there is a direct link, especially a 1 : 1 link, between the hubs. 

Cases where more than one database is involved can be treated in a manner similar to the case 
of a plurality of hubs. Assuming, for the sake of simplicity, tbst each database comprises only 
one hub and that the databases are linked to each other through linking tables, one in each 
database, one can consider flie link between the two databases through the two linking tables 
as a normal edge in a node diagram. Representing fee two databases in such a manner, there is 
conceptually and as far as the method of evaluating a query according to tiie present invention 
is concerned, little differ^ce to the case of two hubs in a single database. That is, after re- 
trieving the related unique identifiers of the hubs on the basis of the query conditions, as de- 
scribe before, one will determine a graph through the two databases involving the hubs of 
the two databases and any other table referred to in the query. This graph will comprise a par- 
tial graph exclxisively relating to the hub in one database, a second graph exclusively related 
to file hub of the other database and a partial gr^h connecting the hub of the first database to 
the hub of the second database and extending through said two linking tables. It should be 
noted that the linking tables are not necessarily the hub tables. . 



Again, the evaluation of the query is analogous to what was described before. One will evalu- 
ate the partial query related to the first and second hub, putting junctions between the three 
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partid graphs and evaluating the B>»»e between the two hubs m the manner described. It 
should be noted that in evaluating fte graph between Ure two hubs one e«. again use the con- 
cept of junctions, if there is no direct Unk betwe«. a» two hubs. One will, in the case of two 
tabs determine an optimum path ttaough the node diagram flrom one hub to the .to and put 
talons be^^een certain nodes, thus defining partial queries which are evaluated separately 
and the results of which are eventually combmed with e«>h oto to render a partial result for 
.his partial graph comtecting the ^vo hubs which is thea merged with the results of the re- 
maining two partial graphs relating to respective hubs. 

- ttthe.p«y conditions rdate only to one database and the»cond databasc-is-only needed^to 
obtain infermation related to «.tries meetmg said query conditions, one may proceed, as de- 
scribed before for the case of two tabs in one single database, by retrieving the primary k^s 
of the tab of the first database related «. the «.tries of the query conditions, then evaluate the 
aynamic Unk «> the hub in flie ofter database and. d^ending on the value of ihe primary keys 
of fte hub of ae second database r«ri«redby the evaluation of flus Utik. evaluate the part«l 
graph deriving ftom Ok hub of said second database. 

The above-mentioned c««ept of spUtting a query hrto several partial queries, each involvmg 
ahub can also be appUed to evahtate queries which mvolve a relational database and anoth^ 
searchable entity which comprises an identifier which allows for idenBflnng a sub-enrity »k1 
extractmg dat« ftom said sub-entity, especially n flat file. If a rela.iot«l database and a flat file 
are involved in the same query, one can spUt the initial query m three parts, n^nely in a part 
involving the hub and tables related thereto m the rdational database, a part that invoh^s «- 
tracting information ftom the flat file and a part that relates to the link be^»een the m file 
and «>e relational database. In a way. one could treat a flat file wiMn the concept of thts m- 
vention in a node diagram like a hub that has no finther tables related thereto. 

For further iUustmtion of the principles of the present mvention. a somewhat shnplistic exam- 
ple wiU be given with reference to the configuration schematically sketched in Fig. 1. Ftg. 1 
Iws a reladonal database which comprises the tables KESEARCHERS, DBPAKTMEKTS. 
ARTICLES AUTHORS and TITLE/ABSTRACT. There may be further tables wta<* are 
indicated by dotted Unes in Fig. 1. The table RESEARCHERS has colmnns RNAME and RI 
(-researchers identifier), the latter definmgtheprimary key for this table. Rl fomrs ahnkkey 
to the table DEPARTMENTS, RNAME fcm>s a link to cohmm RNAME of table 



AUTHORS, which also comprises the columns ANAME and ARTID..ARTID in turn fonns a 
link to the table ARTICLE. The table ARTKXE comprises the columns ARTID, JOURNAL 
YEAR and PAGE. ARTID also fonns a link ftom table ARTICLE to the tables 
TITLE/ABSTRACT with colunms TITLE and ABSTRACT. In this example* the table 
RESEARCHERS is predefined as the hub table, e.g. because the user of the database is an R 
& D institution wishing to monitor the research by its employees. 

Let us now suppose that a query is submitted for all articles mentioning "insulin" in the ab- 
stract and simultaneously requesting to retrieve bibliographic data about the article, the title 
and file name and the department of the respective author. The system will first look in the 
table ABSTRACT/TITLE in column ABSTRACT for entries comprising the word "insulin". 
Subsequently, the system will check whether there is a relation between the table 
ABSTRACT/TITLE and the hub table RESEARCHERS. As there is, through the chain of 
tables ABSTRACT/TITLE - ARTICLE - ANAME - RESEARCHERS, the. system- proceeds 
with retiieving the keys RI related to entiies in the column ABSTRACT in table^ 
ABSTRACT/TITLE containing the word "insulin". It turns our that there are two vidues of RI 
that match this condition, namely the values 1 and 2 (corresponding to the researchers Smith 
and Jones). 

The systan now checks from which tables information is requu-ed. Hiese are liie t^les 
RESEARCHER, DEPARTMENTS, ARTICLE and ABSTRACT/TITLE. Accordingly, the 
systotn will now determine a graph connecting these tables to ihe hub table RESEARCHERS, 
as shown by solid lines in Fig. 1. In a first step, the system will then evaluate all entries m the 
table AUTHOR having flie names Smith and Jones (corresponding to the RI I and 2; . for the 
sake of simplicity it is presumed that the names are um'que identifiers for the researchers) and 
retrieve related values of ARTID. This result is stored as a first partial result set. One will 
note that the subsequent tables ARTICLE and ABSTRACT/TITLE were not queried in this 
step, meaning that a junction was created between the tables AUTHOR and ARTICLE. The 
key ARTID retrieved in this first step is now used as the input in a subsequent query to re- 
trieve the bibliographic data, the title and the abstract. Reviewing the result of this query one 
will note that it contains articles 3 and 4, because Smith and Jones are listed as authors for 
these articles. These articles do, however, not meet the query condition. The system therefore 
chedcs the partial result of this second partial query for consistency and removes those parts 
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of rcsd. set which is not oon^^ wi* the ^ con<U«o. (abstract cootams the word 
-i^").Thisl»avesaiesatt«tasaho™belowvafharticlcID8l andZ. 




one will note ^ ^ axticle ID 1 appears t^cc. be«».se Smith and Jones are co-authors of 

. article 1. — - ' ~ " 

Havingnow evaluated the flr.. branch of .he graph, thcresultotwhi^isaho^inthe^^ 
1 *ove the sy^teor now nnns to evaluaUng a.e second branch eompr^nrg the hub 
and .he .ble DEPARTMENTS. As there are only two t*.es nrvohred nr 
^^thesystenr removes. wi.ou.U>eneedforathr«>eriun^on,.e^^ 

M 1 and 2 conespond to Snndr and Jones and fta. Smlfl. is wo,ta,g m departoen. A ^ 
"nLtv^AinsiIdep.rnnen.B.Tl.sresnl.isnowcon*inedwim*eprevionsres„«^ 

that the entire result set looks as shown below. 
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^ ^ ^w essentially comple«d For the safce of shnpUcity *e .esul. was represented 
rr:l table. Accordh>gtoapre,^«.*odin.en,«„results of the ^^^^ 
would have been converted hrto object which would .Iten have been con*m«. to yteld the 
result to the entire query. 

^. ^ now assun^e O-at. in a di^ setting of the dat^ase. bo* *e tab^ 
^CHERS and ARTICLE have been chosen as hubs. In this case, fcr evaluawg ^ 
^r*e syste. would establish in a «rs. step «ra. ARTID 1 and 2 ™«:h the 
Iinbstrac certains «>e word "insulin-). The systenr would then in a firs. s.ep evata- 




ate the partial graph comprising the tables ARTICLE and ABSTRACT/TrTLB and retrieve 
the bibliographic data, title and abstract corresponding to articles 1 and 2. In a second step, 
the system would, starting from the values 1 and 2 of ARTID, determine tihie related RIs of 
table RESEARCHERS through the chain of tables ARTICLE - AUTHORS 
RESEARCHERS. This could be done in one single query- One could also contemplate to in- 
troduce a junction between AUTHORS and RESEARCHERS, Le. first evaluating a partial 
query for keys of AUTHORS, which is then used as the input for a further query for values of 
RI. As a result, one retrieves the values 1 and 2 for RI. In a third partial query involving, the 
hub table RESEARCHERS and the table DEPARTMENTS are, retrieved- The result sets of 
the partial queries are combined, which will result in the same result set for the initial query as 
shown above. 

If one views the chain of tables ARTICLE - AUTHORS - RESEARCHERS as: a,link be- 
tween the two hubs, one understands that the same procedure could be applied if the table 
RESEARCHERS and ARTICLE belonged to different databases or different libraries and 
were connected through a link between the tables AUTHORS and RESEARCHERS. It should 
be understood that the link between the databases does not have to be to the hub table of one 
of the databases or libraries. 

In a modification of the above example of two hubs in one database, one could also in a first 
step determine the values of the primary keys of both hubs meeting the query- condition. This 
would return the RIs 1 and 2 and Hie ARTIDs 1 and 2, which could then be used to evaluate 
the two branches of the graph involving the respective hub table and the tables depending 
therefrom. As a fiirther alternative, one could choose to implement a direct static link between 
ARTICLES and RESEARCHERS. 

In a fiirther example^ the advantage of involving junctions is shown with reference to the da- 
tabase shown in Fig. 3. This database comprises three tables, shown below, with the table 
"Articles Table" being defined as the hub table and ArticlelD as the unique identifier of said 
hub table. 
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C»,idaring a ^axch for all mfcnna«ion related «> Ardde ^ 88, two - J^";^ 
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(Articles) and each of the related tables (RefereKea «»J AaAon) 



and sotted. result set wOl have 3 n^ult^Ued by 2 - 6 
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Single Result Set Returned 

Theredm^^ttdatawmbeignoredd^rrngthefllteringprocessandanobjectcot^ 

theumqueoaia. inuu f„^T,^fpi RefP2 RefP3 and the Smith and 

even though it appears 6 times. The same apphes for RefPl . RefP2, RetF 




Jones redundancies. Each unique field is only read once owing to the filtering and object 
creation algorithm. Expressing the search result as an object or a structure of objects, it is pos- 
sible to avoid redundancies in the result This bdng so, it would, of course, be preferable to 
avoid redundaudes in the first place. This can be done with the concept of junctions. 

Case 2: Both links are junctions 

Two queries are constructed and submitted either sequentially or in parallel. Two result sets 
are returned which are shown below. 
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Result Set 2 



The redundant data is ignored (only "88" and "Title88'* in this case). An object is constructed 
from these two result sets. This object is equivalent to the object constructed in Case 1. 

One will note ttiat with the singjle query, the result set is larger and contains more redundant 
mformation. The number of returned rows will dramatically rise as the number of tables 
(witihi ottierthan 1-1 relationships) increases. Complex relationships are^lhe usual situation in 
most real-world databases. 

It slKiuid be understood that the above examples are merely for the^purpose of illustration and. 
thatihe-invention is in no way limited to Ihese examples. 

The features of the invention disclosed in the specification, the claims and the drawings can, 
bofe-singly, as well as in any arbitrary combination, be material for the realization of the in- 
vention in its various embodiments. 
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1. Method of evaluating a query involving at least one relational database comprising a 
relational database management system (RDBMS), said query relating to at least one 
table of said relational database, 

said method comprising 

determining a table of said relational database as a gateway table for evaluating said 
query, 

retrieving one or more unique identifiers of said gateway table related to one or more 
entries in a table to be queried, 

retrieving information firom one ore more tables to be queried related to said retrieved 
unique identifiers of said gateway table, 
providing a result to said query. 

2. Method according to claim 1, wherein said relational database comprises one or more 
predetermined hub tables and said query relates to at least one table of said relational 
database and wherein said method comprises: 

retrieving one or more unique identifiers of a hub table related to one or more entries 
in a table to be queried, 

retrieving uiformation firom tables to be queried related to said retrieved unique identi- 
fiers of said hub table, 
providing a result to said query. 

3. Method according to one of claims 1 or 2, wherein at least one library is defined on 
one or more of said databases, said library consisting of tables linked to each other and 
having exactly one table defined as a hub table. 

4. Method according to one of claims 1 to 3, wherein said query is for complete sets of 
related entries of said relational database or of a library or for parts of such complete 
sets of related entries and comprises one or more query conditions related to said data- 
base or Ubraty, wherein said method comprises: 

identifying a gateway table related to entries specified in a query condition. 
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. identifying one or more imique identifiers of said gateway table related to said 
entries conforming to query conditions, 

retrieving complete sets of related entries or part thereof which are related to 
said unique identifiers of said gateway table. 

Method according to one of claims 2 to 4, wherdn said query involves at least a sec- 
ond searchable entity outside said database or outside a Ubrary involved in said query, 
said second entity comprising sub-entity each having at least one identifier umquely 
identifying said sub-entities, and wherein said.metiiod comprises: 

...retrievingoneormore identifier of sub-entities of said. s^^ 

tity related to said query, 

retrieving one or more unique identifiers of a hub table of said relational data- 
base or Ubrary related to said retrieved identifiers of said sub-entities, 
retrieving sets of related entries or predetermined parts thereof related to said 
retrieved unique identifiers of said hub table, 

retrieving information from a sub-entity identified by a retrieved identifier in 
said second entity, 

combining the relrieved information from said second searchable entity and 
said database or library into a result. 

Metiiod according to one of claims 2 to 5, wherein said query involves at least a sec- 
ond searchable entify outside said database or outside a library involved in said query 
and comprising sub-entities, each sub-entity having at least one identifier umquefy 
specifying said sub-entity, and wherein said method comprises: 

retrieving one or more unique identifiers of a hub table of said database or h- 

brary related to entries related to said query, 

retrievmg identifiers of sub-entities of said second searchable entify related to 
said retrieved unique identifiers of said hub table, 

retrieving sets of related entries or predetermined parts thereof related to said 
retrieved unique identifiers of said hub table, 

retrieving mformation from said sub-entities identified by identifiers retrieved 
in said second searchable entify, 

combining the retrieved information from said second entify and said data base 
or library into a result. 
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7. Method according to one of claims 5 or 6, wherein said second searchable entity is a 
relational database or a library and said identifier is a unique identifier of a hub table 
in said relational database or library. 

8. Method according to one of claims 5 or 6» wherein said second searchable entity is a . 
collection of flat files with the sub-entities being flat files in this collection. 

9- Method according to one of claims 4 to 8, wherein said step of retrieving a relation 
between identifiers of said second searchable entity and unique identifiers of hubs of 
said database or library comprises the step of discarding combinations of identifiers of 
hubs with identifiers of said second searchable entity which are not consistent with the 
query conditions and retrieving only such additional information related to an identi- 
fier which is comprised in a combination of identifiers consistent with the selection pa- 
rameters. 

10, Method according to one of claims 2 to 9, wherein the query relates, to tables related to 
at least two hub tables, wherein said method comprises: 

retrieving one or unique identifiers of a hub table or hub tables, said identifier 
being related to entries satisfying query conditions in tables related to the re- 
spective hub, 

retrieving unique identifiers of the respective other hub or hubs related tp said 
retrieved unique identifiers related to entries satisfying the query conditions, . . 
retrieving sets of related entries or predetermined parts thereof which are re- 
lated to said retrieved unique identifiers of said hubs accordmg to the query, 
combining the retrieved information related to said hubs into a result 

IL Method according to claim 10, wherem said step of retrieving a relation between 
unique identifiers of said hub tables comprises the step of discarding combinations of 
xmique identifiers of hub tables which are not consistent with the query conditions and 
retrieving only such additional information related to a unique identifier which is 
comprised in combinations of unique identifiers consistent with the search parameters. 
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Meliiod according to claim 10 or 11. wherein at least one of said hubs is the hub of a 
library and the query relates to said library. 

Method accordmg to claim 12. wherein the query relates to two Ubraries and said hubs 
are hubs of two libraries. 

Method according to one of claims 10 to 13, wherein said two hubs are hubs within the 

same relational database, 

Melhod according to one of claims 4 to 14, whereinihe step of rebieyingmdque ide^. ... 
tifiers of a hub table and/or identifiers of a searchable entity which are related to an- 
oliier unique identifier of a hub table and/or identifier of a searchable entity is per- 
formed on the basis of pre-established relations between identifiers of said entities. 

Method according to one of claims 4 to 14. wherein fee step of retiieving unique iden- 
tifiers of a hub table and/or identifiers of a searchable entity which are related to an- 
other mrique idmtifier of a hub table and/or identifier of a searchable entity is per- 
formed dynamically during the execution of the query. 

Method according to one of claims 1 to 16, wherein in performing said step of re- 
trieving informationrelated to aunique identifier of said gateway-table, selected tables 
are queried which, in a gr^hical representation of the database wherein the tables are 
represented as nodes and links between the tables are represented as lines between the 
nodes, form a comiected graph comtecting the gateway table to tables referred to. in the 
initial query. 

Metiiod according to claim 17. wherein said step of querying tables on said gr^h 
comprises perfomiing consecutive partial queries, wherein a result of aprevious query 
is used as input for a later query, a first of said partial queries involving the gateway 
table and a query other than the.first query relating to a table referred to in the mitial 
query. 
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19. Method according to claim 18, wherein said result of said previous query comprises 
the value of a foreign key of a table involved in said later query and wherein said value 
of said foreign key is used as input for said later query. 

20. Method according to claim 18 or 19, wherein the result of said partial queries is stored 
as an object or objects. 

21. Method according to one of claims 18 to 20, wherein after each partial query a. redun- 
dancy check and/or a check for consistency is carried out in the respective result and 
the result purged of redundancies and/or inconsistencies is stored. 

22. Method according to claim 21, wherein said redimdancy check is carried out in creat- 
ing said object comprising the result of said partial query or directly on said object af- 
ter creation of the same. 

23- Method according to one of claims 18 to 22, wherein each partial query involves a 
table or a plurality of tables linked to each other and wherein each partial query other 
than the first has as input previously established values of link, keys, said link keys 
linking said table or one or more of said plurality of tables to another table not in- 
volved in said partial query. 

24. Method according to one of claims 1 8 to 23, wherein said gr^h comprises at least one 
branch node having links to at least two other nodes and wherein tables referred to in 
the initial query are related to separate branches deriving from said branch node, 
wherein a partial query is carried out involving the table corresponding to said branch 
node (branch table) and wherein at least one partial query is carried out for one or 
more tables contained in each branch which has the result of the partial query involv- 
ing the branch table as an input. 

25. Method according to one of claims 1 8 to 24 comprising the steps of 

identifying the hub table or hub tables related to tables referred to in the initial 
query. 
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deteimming in said graphical representation of said database, at least for one 
hub table, an optimum graph connecting said hub to all tables which are related 
to said iaib and which are referred to iu the initial query, 
performing consecutive partial queries involving tables which are consecutive 
to each oflier with regard to said optimum graph. 

Method according to one of claims 1 to 25, wherein said step of retrieving unique 
identifiers of said gateway table comprises: 

determining a table that is refared to in the initial query, 
determining, in a graphical representation of said dat^.ase, wherein tabl^ are 

represented as nodes and links between tables as hnes between the nodes, a 

gateway table connected to said table, 

querying said database for one or more indices of the gateway table which are 
related to said table. 

Method according to claim 26. wherein one or more specific entries of said table are 
impUed by a query condition and said database is queried for indices of said gateway 
table which are related to said entry or entries. 

Method according to claim 26 or 27, wherein m said graphical representation, a path 
from said table to said gateway table is estabUshed and said query for said indices is 
performed by querying all tables corresponding to nodes m said graph for the values of 
Mnk keys between flie tables in said graph, starting from the table referred to in the 
query and, given the case, certain entries thereof. 

Method according to claim 28, wherein said path is selected as a shortest path between 
said table and said gateway table according to a predetermined metric. 

Method according to claim 28 or 29, wherein said path is part of or identical to fhe 
gr^h for determming partial queries for retrieving additional mformation from tables 
related to said gateway table. 



Method according to one of clauns 26 to 30, comprising tiie step of: 
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detecmining a unique identifier for one or more rows of the gateway table re- 
lated to said indices, if an index or a group of indices related to the same row of 
the gateway table determined by said step of querying the database does not 
uniquely identify a row of said gateway table. 

Method according to one of claims 1 to 31, wherein partial queries used for evaluating 
the initial query are at least partially created dynamically during the process of said 
evaluation. 

Method according to one of claims 1 to 32, wherein said result.is rq>resented in an 
object oriented representation. • • 

Method according to claim 33, wherein the result of said initial query is expressed as 
an object derived by means of object-relational mapping. 

Method according to one of claims 1 to 34, wherein said evaluation of said query is } 
performed tinder Ibe control of an object manager, said object manager comprising a 
sequence of commands to be executed by a computer system. 

Method according to claim 35, wherein said object manager handles an object which 
represents the schema or part of a sdhema of one or more databases to be queried. 

Method according to claim 35 or 36, wherein said object manager defines classes 
which are dynamically created and instantiated. 

Data processing system for controlling the evaluation of a query involving a relational 
database comprising a relational database management system (RDBMS), said query 
relating to at least one table of said relational database, comprising: 

means for determining a table as a gateway table for evaluating said query, 
means for establishing a relation between a table or tables to be queried and a 
gateway table, 

means for causing the RDBMS to retrieve one or more unique identifiers of 
• said gateway table related to one or more entries in a table to be queried, 
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means for causing the KDBMS to retrieve infoimation from tables to be que- 
ried related to said retrieved unique identifiers of said entry, 
means for providing or causing to be provided a result to said query. 

Data processing system according to claim 38. comprising means for setting certain 
tables in said relational database as predetermined gateway tables for queries to be 
evaluated. 

DatapiocMdng system according to claim 38 or 39, comprising means &r controlling 
to «eontionof amethod according to onoof claims l.,o-37 .by a 4a.a procuring- 
system or data processing systems. 

Computer program causing a computer or computer system, when executed thereon, to 
perfonn the steps of a method according to one of claims 1 to 37. 

A computer readable storage medium, comprising a program according to claim 41 . 
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The invention provides a method of evaluating a query involving a relational database com- 
prising a relational database management system (RDBMS), said query relating to at least one 
table of said relational database, said method comprising determining a table as a gateway 
table for evaluating said query, retrieving one or more unique identifiers of said gateway table 
related to one or more entries in a table to be quaied, retrieving information fiiom tables to be 
queried related to said retrieved unique identifiers of said gateway table, and providing a re- 
sult set conq)rising the retrieved primary keys of the gateway table in relation to said retrieved 
information. The invention further provides a related data processing system and program. 
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